Plastome phylogenomics and morphological traits analyses provide new insights into the phylogenetic position, species delimitation and speciation of Triplostegia (Caprifoliaceae)

Background The genus Triplostegia contains two recognized species, T. glandulifera and T. grandiflora, but its phylogenetic position and species delimitation remain controversial. In this study, we assembled plastid genomes and nuclear ribosomal DNA (nrDNA) cistrons sampled from 22 wild Triplostegia individuals, each from a separate population, and examined these with 11 recently published Triplostegia plastomes. Morphological traits were measured from herbarium specimens and wild material, and ecological niche models were constructed. Results Triplostegia is a monophyletic genus within the subfamily Dipsacoideae comprising three monophyletic species, T. glandulifera, T. grandiflora, and an unrecognized species Triplostegia sp. A, which occupies much higher altitude than the other two. The new species had previously been misidentified as T. glandulifera, but differs in taproot, leaf, and other characters. Triplostegia is an old genus, with stem age 39.96 Ma, and within it T. glandulifera diverged 7.94 Ma. Triplostegia grandiflora and sp. A diverged 1.05 Ma, perhaps in response to Quaternary climate fluctuations. Niche overlap between Triplostegia species was positively correlated with their phylogenetic relatedness. Conclusions Our results provide new insights into the species delimitation of Triplostegia, and indicate that a taxonomic revision of Triplostegia is needed. We also identified that either rpoB-trnC or ycf1 could serve as a DNA barcode for Triplostegia. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-023-04663-4.

The plastid genome in almost all land plants exhibits a highly conserved quadripartite structure [28], generally ranging from 120 to 160 kb in size and containing 110-130 distinct genes including ~80 protein-coding genes, 30 transfer RNA (tRNA) genes, and four ribosomal RNA (rRNA) genes [29,30]. Plastid genomes are predominantly maternally inherited in plants [31]. Due to their high copy number within cells, plastid genomes can be sequenced, assembled, and annotated more easily and cost-effectively than nuclear genomes [32,33], aiding their widespread use in elucidating the evolutionary history of green plants [34][35][36]. Moreover, plastid genomes have emerged as super DNA barcodes or ultra-barcodes [37,38], containing a higher number of informative sites and exhibiting greater discriminatory power than standard plant DNA barcodes [39,40]. Plastid genomes have recently been used in discovering cryptic species and screening taxon-specific DNA barcodes for particular plant lineages [19,41,42].
The Hengduan Mountains Region (HDM), also known as the Mountains of Southwest China, is recognized as one of the world's biodiversity hotspots [43,44]. It is known for harboring the richest temperate alpine flora in the world [45], and as a center of diversity for numerous plant lineages [46][47][48][49]. Its topography is characterized by a series of north-south oriented alpine mountain ranges separated by deep river gorges [47], which act as genetic barriers for some plant taxa [50,51] and therefore have contributed to the high species diversity in this region.
The genus Triplostegia Wall. ex DC. comprises two traditionally recognized species: T. grandiflora Gagnep. (1901) is confined to the HDM in northern Yunnan and western Sichuan, whereas T. glandulifera Wall. ex DC. (1830) is widely distributed in the mountains of southwestern and central China, extending to Taiwan, Bhutan and Nepal [52] (Fig. 1). Recent evidence placed Triplostegia within subfamily Dipsacoideae of Caprifoliaceae [53][54][55], but its affinities have long been controversial [56][57][58]. Morphologically, Triplostegia grandiflora differs from T. glandulifera in its sessile (not petiolate) leaves, longer corolla and more elongated inflorescence branch [52], but recent studies based on molecular and morphological evidence have proposed merging T. grandiflora into T. glandulifera [59,60]. However, our own investigations in the southern region of the HDM have revealed a third taxonomic entity, here termed Triplostegia sp. A, which often occurs sympatrically with T. grandiflora, but differs from both recognized species in its glabrous and slender lateral taproot, petiolate leaves with serrate margins, and corollas usually 1-2 mm in length. Furthermore, whereas T. grandiflora occurs in Pinus yunnanensis and P. armandii forests at 2066-3128 m, Triplostegia sp. A has a larger elevation range, 2651-3954 m according to our fieldwork, occurring in Pinus, Quercus, Abies, and Picea forests plus roadsides, riversides, and alpine meadows.
In this study, we performed genome skimming on 22 individuals collected from 14 populations of Triplostegia sp. A and eight populations of T. grandiflora, all from the southern HDM within the northwest region of Yunnan Province. To these were added 11 recently published Triplostegia plastomes covering the entire distribution range of Triplostegia [59]. Furthermore, we recorded morphological and functional traits of Triplostegia species from herbarium specimens and wild plants. Our main objectives were to address the following questions.
(1) What is the phylogenetic position of Triplostegia? (2) How many distinct species exist within Triplostegia? (3) When did diversification occur among Triplostegia species? (4) Are there any highly variable regions in the plastid genome that could be used as taxon-specific DNA barcodes for discriminating Triplostegia species? (5) Did any geographical features play a role in promoting diversification within Triplostegia?

Taxon sampling
One individual was randomly selected from each of 8 and 14 populations of T. grandiflora and Triplostegia sp. A in northwest Yunnan Province, respectively (Table S1; Fig. 6). Healthy and fresh leaves were collected and immediately dried using silica gel. Vouchers were deposited in the Herbarium of Nanchang University. In addition, 11 sequences of Triplostegia were downloaded from the NCBI Sequence Read Archive (SRA) for analysis, comprising seven samples of T. glandulifera, one of T. grandiflora, and three of the unrecognized species Triplostegia sp. A (Table S1).

DNA isolation and sequencing
Total genomic DNA was extracted from silica-gel-dried leaves using a modified CTAB method [61]. The DNA samples were then sheared into fragments and used to construct 500 bp libraries by the Molecular Biology Experiment Center, Germplasm Bank of Wild Species in Southwest China, following the manufacturer's manual (Illumina, San Diego, CA, USA). Paired-end sequencing of 150 bp was performed on an Illumina HiSeq 2500 platform.

Genome comparison and structural analysis
The plastid genomes were aligned and visualized using mVISTA in Shuffle-LAGAN mode [68]. To investigate potential rearrangements in the plastid genomes, multiple sequence alignment was performed using MAUVE [69]. Comparisons of boundaries between the single-copy regions and the inverted repeat (IR) regions among the plastid genomes were performed using IRscope [70].

Nucleotide diversity and genetic differentiation analysis
We used nucleotide diversity (π) to assess the levels of plastid genomic divergence within Triplostegia, and to identify highly variable plastid DNA regions, using DnaSP v6.0 [71] with a window length of 600 bp and a step size of 200 bp.
Genetic differentiation (FST) and gene flow (Nm) among species, as well as within-species genetic diversity, were also estimated using DnaSP v6.0 [71], with all samples belonging to each taxon being considered a population.
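The sliding-window calculation described above can be illustrated with a minimal Python sketch. It is not the DnaSP implementation: it assumes an upper-case alignment of equal-length sequences, ignores sites with gaps or ambiguity codes only within each pairwise comparison, and divides by the full window length, whereas DnaSP has its own rules for excluding sites. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations

VALID = set("ACGT")

def pairwise_differences(a, b):
    """Count sites where two aligned sequences carry different unambiguous bases."""
    return sum(1 for x, y in zip(a, b) if x in VALID and y in VALID and x != y)

def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per aligned site (pi)."""
    n, length = len(seqs), len(seqs[0])
    if n < 2 or length == 0:
        return 0.0
    total = sum(pairwise_differences(a, b) for a, b in combinations(seqs, 2))
    return total / (n * (n - 1) / 2) / length

def sliding_pi(seqs, window=600, step=200):
    """Return (window midpoint, pi) for every full window along the alignment."""
    length = len(seqs[0])
    return [
        (start + window // 2,
         nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]

# Made-up three-sequence alignment, scanned with a tiny window for illustration
toy = ["AAAA", "AAAT", "AATT"]
profile = sliding_pi(toy, window=2, step=1)
```

With real plastome alignments the defaults (window=600, step=200) match the settings given in the text; peaks in the resulting profile mark candidate highly variable regions.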

Phylogenetic analysis
Phylogenetic relationships among the 33 Triplostegia samples (Table S1) were examined using maximum-likelihood analysis (ML) and Bayesian inference (BI), based on three datasets: the complete plastid genomes, plastid protein-coding sequences (CDS), and nrDNA sequences. As outgroups we used Pterocephalus hookeri (C.B.Clarke) Airy Shaw & M.L.Green, Dipsacus asper Wall. ex DC., Scabiosa tschiliensis Grüning, Kolkwitzia amabilis Graebn., and Patrinia heterophylla Bunge, representing genera closely related to Triplostegia [59]. Three samples were randomly selected from each of the three clades formed by the 33 samples of Triplostegia, and used to determine the position of the genus in the phylogeny of Dipsacales via ML and BI analyses. For this, 57 complete plastid genomes were obtained, including the above outgroups and covering the recognized families within the order (Table S2). Sesamum indicum L., Mentha spicata L., Pittosporum kerrii Craib, and Apium graveolens L. were chosen as outgroups to Dipsacales based on previous studies [53][54][55].

Species discrimination analysis
We assessed the effectiveness of standard plant DNA barcodes, including rbcL, matK, and ITS, and the barcode ycf1 suggested by Dong et al. (2015) [77], plus the highly variable plastid DNA regions of Triplostegia and their combinations, in discriminating Triplostegia species using tree-based methods. ML trees for each marker were constructed using RAxML with the same settings as previously described. A species was considered as being correctly resolved when all the individuals of the same species formed a monophyletic group with >70% bootstrap support [78]. The standard DNA barcode trnH-psbA was not included in the analysis due to its insufficient number of informative sites for species discrimination. In addition, we generated Neighbour-Joining (NJ) trees using MEGA v10 [79] based on the highly variable plastid DNA regions and the ITS region, using the p-distance model with 1000 bootstrap replicates.
In addition to the tree-based analyses, we also conducted distance-based analyses following Hollingsworth et al. (2009) [22]. Pairwise interspecific and intraspecific genetic distances were calculated under the Kimura 2-parameter (K2P) model using MEGA v10 [79]. A species was considered to be successfully discriminated if the minimum interspecific K2P distance involving this species was greater than its maximum intraspecific K2P distance.
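The distance-based criterion can be sketched in a few lines of Python. The K2P formula used is the standard one, d = -0.5 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. Everything else here is an illustrative assumption rather than the MEGA workflow: sites with gaps or ambiguity codes are dropped per pair, the function names are hypothetical, and the sequences are invented.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p(a, b):
    """Kimura 2-parameter distance between two aligned sequences."""
    n = ts = tv = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes for this pair
        n += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1  # transition
        else:
            tv += 1  # transversion
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

def barcode_gap(seqs_by_species):
    """True for a species if its minimum interspecific distance exceeds
    its maximum intraspecific distance (needs at least two species)."""
    result = {}
    for sp, seqs in seqs_by_species.items():
        max_intra = max((k2p(a, b) for a, b in combinations(seqs, 2)), default=0.0)
        min_inter = min(
            k2p(a, b)
            for other, other_seqs in seqs_by_species.items() if other != sp
            for a in seqs for b in other_seqs
        )
        result[sp] = min_inter > max_intra
    return result

# Invented 20-bp sequences for two hypothetical species
toy = {
    "species_X": ["A" * 20, "A" * 19 + "G"],
    "species_Y": ["CCCC" + "A" * 16, "CCCC" + "A" * 16],
}
result = barcode_gap(toy)
```

A species represented by a single sequence has no intraspecific distance; the sketch treats its maximum intraspecific distance as zero, which is a design choice and not something stated in the text.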

Divergence time estimation
We obtained plastome sequences from GenBank (Table S3) for a total of 27 species of Dipsacales, two species of Apiales (Apium graveolens L. and Pittosporum kerrii Craib), and two species of Lamiales (Sesamum indicum L. and Mentha spicata L.) for the purpose of estimating divergence times. ModelTest analysis indicated that the GTR + I + G nucleotide substitution model performed the best (Table S4). We used BEAST v2.6.6 [80] to estimate divergence times under a relaxed lognormal clock and the GTR + I + G nucleotide substitution model. Markov Chain Monte Carlo (MCMC) searches were performed for 500,000,000 generations, sampling every 25,000 generations. The tree prior was specified as a Yule process. Tracer v.1.5 [81] was used to assess chain convergence and to ensure that the effective sample sizes (ESS) were greater than 200. The maximum clade credibility (MCC) tree with median heights was computed using TreeAnnotator v2.6.6. Four calibration points were used: (1) the crown age of Dipsacales was set to 103 million years ago (Ma), with a normal prior (mean = 103 Ma, SD = 1.0), based on previous studies [82,83]; (2) the earliest fossil record of Viburnum from the late Paleocene to early Eocene [84,85] was used to calibrate the crown group of Adoxaceae, with a lognormal prior (mean = 0, SD = 1.0, offset = 56 Ma), following Moore and Donoghue (2007) [86]; (3) the divergence time between Weigela and its sister group Diervilla was calibrated with a lognormal prior (mean = 0, SD = 1.0, offset = 23 Ma), following Wang et al. (2015) [83]; and (4) the fossil fruits of Diplodipelta (36 Ma) [87] were used to calibrate the stem age of Dipelta, with a lognormal prior (mean = 0, SD = 1.0, offset = 36 Ma), following Wang et al. (2015) [83].
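To make the calibration priors more concrete, the Python sketch below computes the median and 95% interval implied by each prior, together with the number of states the chain would log. It assumes that the lognormal "mean = 0" is specified in log space (the text does not say whether the mean was given in log or real space), and the helper name is hypothetical.

```python
import math
from statistics import NormalDist

def offset_lognormal_quantile(p, mu, sigma, offset):
    """Quantile of offset + LogNormal(mu, sigma), with mu and sigma in log space."""
    return offset + math.exp(mu + sigma * NormalDist().inv_cdf(p))

# Fossil-based calibrations from the text: (label, offset in Ma)
calibrations = [
    ("Adoxaceae crown", 56.0),
    ("Weigela-Diervilla split", 23.0),
    ("Dipelta stem", 36.0),
]
for label, offset in calibrations:
    low, median, high = (
        offset_lognormal_quantile(p, 0.0, 1.0, offset) for p in (0.025, 0.5, 0.975)
    )
    print(f"{label}: median {median:.2f} Ma, 95% prior interval {low:.2f}-{high:.2f} Ma")

# Normal(103, 1) prior on the Dipsacales crown age
root = NormalDist(mu=103.0, sigma=1.0)
root_interval = (root.inv_cdf(0.025), root.inv_cdf(0.975))

# States logged over the whole chain, before any burn-in is discarded
n_samples = 500_000_000 // 25_000
```

Under this assumption each lognormal prior places its median 1 Myr older than the fossil offset and has a long tail toward older ages, which is the usual reason for choosing an offset lognormal for minimum-age fossil constraints.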

Morphological and functional traits analyses
We collected morphological trait data of Triplostegia species by measuring specimens across their distribution range. For 63, 27, and 55 specimens of T. glandulifera, T. grandiflora, and Triplostegia sp. A respectively, we measured 10 morphological traits, including plant height, taproot length and width, leaf length and width, petiole length, leaf fission depth, corolla length, fruit length and width. These traits were chosen because most of them showed disparities among the Triplostegia species according to our observations. In addition, 121 and 149 individuals from 8 and 14 populations of T. grandiflora and Triplostegia sp. A, respectively (identified by S-L Tan), sampled where the two taxa co-occur in northwest Yunnan (Fig. 6), were examined for eight morphological and functional traits: plant height, leaf chlorophyll content, leaf area, leaf thickness, leaf dry mass, specific leaf area (SLA), corolla length, and corolla width, following previously applied protocols [88][89][90]. We conducted principal component analysis (PCA) based on these morphological traits. Kruskal-Wallis tests and pairwise Wilcoxon rank sum tests were used to assess the differences in each trait among the three Triplostegia taxa. All statistical analyses were conducted using R version 4.1.3 [91].

Species distribution modelling
We used MaxEnt v.3.4.1 [92] to assess the suitable climate envelopes of T. glandulifera, T. grandiflora, and Triplostegia sp. A across the past, present, and future periods. Species occurrence records were obtained from the Chinese Virtual Herbarium (http://www.cvh.ac.cn/), the Global Biodiversity Information Facility (https://www.gbif.org/), plus our own field collections. To ensure data quality, we refined the occurrence records following the criteria described by Qiu et al. (2023) [93] by removing: 1) duplicate records, 2) records lacking spatial coordinates or specific locations, 3) specimens with identification errors, and 4) unreliable records that were located in cities or bodies of water. To reduce the effect of spatial autocorrelation and the consequent overfitting, occurrence records within five kilometers of another were filtered out. Ultimately, our final dataset for species distribution modelling consisted of 64, 30, and 67 occurrence records for T. glandulifera, T. grandiflora, and Triplostegia sp. A, respectively (Fig. 1).
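The 5 km filtering step can be illustrated with a short Python sketch. The text does not state which thinning algorithm or tool was used, so the greedy rule below (keep a record only if it lies at least 5 km from every record already kept) is an assumption, as are the function names and coordinates. Distances are great-circle distances from the haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(p, q):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def thin_records(records, min_km=5.0):
    """Greedy spatial thinning: keep a record only if it is at least
    min_km away from every record kept so far."""
    kept = []
    for rec in records:
        if all(haversine_km(rec, k) >= min_km for k in kept):
            kept.append(rec)
    return kept

# Invented occurrence points (latitude, longitude)
points = [(27.00, 100.00), (27.01, 100.00), (27.00, 100.00), (27.10, 100.00)]
thinned = thin_records(points)
```

Because exact duplicates are 0 km apart, the same pass also removes them. The outcome of greedy thinning depends on record order, so a real analysis might randomise the order or use a dedicated thinning tool.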
Nineteen bioclimatic variables were downloaded from WorldClim v1.4 for the Last Interglacial (LIG; 120,000-140,000 years ago), the Last Glacial Maximum (LGM; 22,000 years ago), and the Mid-Holocene (MH; 6000 years ago) periods, and from WorldClim v2.1 (https://www.worldclim.org/) for the present and future (2090: average of 2081-2100) (Table S5), with a resolution of 2.5 arc-minutes (approximately 5 km at the equator). To provide a conservative and a comparatively larger estimate of species distribution change under future climate conditions, we used two Shared Socioeconomic Pathways (SSPs) for future climatic conditions: SSP2-4.5 (moderate climate change) and SSP5-8.5 (pessimistic climate change) from the CMIP6 (BCC-CSM2-MR) climate model [94]. To avoid multicollinearity of variables, we performed a Pearson's correlation test for the 19 bioclimatic variables for each species, and for any pair of variables with Pearson's r > 0.8, the variable with the higher percentage contribution was retained. The Area Under the Receiver-Operating Characteristic (ROC) Curve (AUC) values were used to evaluate the accuracy of the species distribution models [95]. The AUC values range from 0.5 to 1, and are categorized as failing (0.5-0.6), poor (0.6-0.7), fair (0.7-0.8), good (0.8-0.9), and excellent (0.9-1) [96]. Jackknife analysis was used to determine the relative significance of each bioclimatic variable [97]. To determine the potential distribution of each species, we reclassified the MaxEnt output file using the 10-percentile training presence logistic threshold value (10TPL) [98,99]. We used SDMtoolbox v2.4 in ArcGIS 10.2 to calculate the suitable area changes between different periods.
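The variable-screening rule can be sketched as follows. This is a minimal Python illustration, not the procedure actually run: it applies the threshold to the absolute value of r (the text states r > 0.8), it processes variables in descending order of percentage contribution, and the variable values and contributions are invented.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_variables(values, contribution, threshold=0.8):
    """Visit variables from highest to lowest contribution and keep one only
    if |r| with every variable already kept does not exceed the threshold."""
    ranked = sorted(values, key=lambda v: contribution[v], reverse=True)
    kept = []
    for var in ranked:
        if all(abs(pearson_r(values[var], values[k])) <= threshold for k in kept):
            kept.append(var)
    return kept

# Invented values at five occurrence points and invented contributions (%)
values = {
    "bio1": [1, 2, 3, 4, 5],
    "bio11": [2, 4, 6, 8, 11],
    "bio18": [5, 1, 4, 2, 3],
}
contribution = {"bio1": 10, "bio11": 30, "bio18": 60}
selected = select_variables(values, contribution)
```

In this toy case bio1 and bio11 are almost perfectly correlated, so only the one with the larger contribution (bio11) survives alongside bio18.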
To quantify niche similarity between species, we used the software ENMTools v1.3 [100] to estimate the niche overlap among Triplostegia species using two metrics: Schoener's D [101] and Warren's I [102].Both metrics range from 0 to 1, with values closer to 1 indicating a higher degree of niche overlap between the species.
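Both overlap metrics are simple functions of two suitability surfaces, as the Python sketch below shows. It assumes each model's suitability scores over the same grid cells are first normalised to sum to 1, and it uses the corrected form of Warren's I, I = 1 - 0.5 Σ(√pX - √pY)². The function names and example values are hypothetical, and the code is not taken from ENMTools.

```python
import math

def _normalise(scores):
    """Rescale non-negative suitability scores so that they sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("suitability scores must sum to a positive value")
    return [s / total for s in scores]

def schoener_d(x, y):
    """Schoener's D: 1 - 0.5 * sum(|pX - pY|) over grid cells."""
    px, py = _normalise(x), _normalise(y)
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(px, py))

def warren_i(x, y):
    """Warren's I: 1 - 0.5 * sum((sqrt(pX) - sqrt(pY))^2) over grid cells."""
    px, py = _normalise(x), _normalise(y)
    return 1.0 - 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(px, py))

# Invented suitability scores for two taxa over the same four grid cells
taxon_a = [1.0, 1.0, 0.0, 0.0]
taxon_b = [0.0, 1.0, 1.0, 0.0]
d_value = schoener_d(taxon_a, taxon_b)
i_value = warren_i(taxon_a, taxon_b)
```

Identical surfaces give D = I = 1 and surfaces with no shared cells give D = I = 0; the toy pair above shares half of its suitability mass, so both metrics equal 0.5.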

Comparative analysis of plastid genome structures
According to mVISTA (Fig. S2) and MAUVE (Fig. S3) analysis, the plastid genome structures were highly conserved in Triplostegia, with no inversion or rearrangement detected. The LSC and SSC regions were more variable than the IR regions, and the non-coding regions were more variable than the coding regions. The IR/SSC and IR/LSC junction regions of Triplostegia contained seven genes: rpl2, rpl23, trnN, ndhF, ycf1, trnI, and trnH (Fig. S4).
For both the plastid and nrDNA data, the degree of genetic differentiation (FST) was relatively high among the three Triplostegia species (Table 2). FST was highest between T. grandiflora and Triplostegia sp. A (0.89533 for plastids, 0.93251 for nrDNA), followed by T. grandiflora vs. T. glandulifera (0.80408 for plastids, 0.78292 for nrDNA), and Triplostegia sp. A vs. T. glandulifera (0.77473 for plastids, 0.65421 for nrDNA). However, our phylogenetic analyses, based on plastid genomes, detected no correlation between the phylogenetic relatedness of samples and their geographical distribution on either the same or opposite sides of the Jinsha River, for both T. grandiflora and Triplostegia sp. A (Fig. 6).

Species discrimination based on standard DNA barcodes and highly variable cpDNA regions
In tree-based analyses, none of the standard plant DNA barcodes (rbcL, matK, trnH-psbA, and ITS), whether used singly or in combinations, could successfully discriminate all three Triplostegia taxa (Table 3). However, the highly variable cpDNA region rpoB-trnC alone successfully discriminated all three species with relatively high node support values (100% for T. glandulifera, 87% for T. grandiflora, and 98% for Triplostegia sp. A), and the ycf1 gene did the same. The highly variable cpDNA regions ndhF and ndhF-trnN could not distinguish all three Triplostegia species alone or in combination. All other combinations of two to four of the regions ndhF, ndhF-trnN, rpoB-trnC and ITS could successfully discriminate all three Triplostegia taxa, except for ndhF + ndhF-trnN and ndhF-trnN + ITS. In particular, the combination of rpoB-trnC and ITS discriminated all three Triplostegia species with maximum support values. Furthermore, all T. grandiflora samples contained an identical 66 bp insertion sequence in their ycf1 gene, while all T. glandulifera samples contained an identical 18 bp insertion sequence in ycf1. Distance-based analyses revealed that any of the three highly variable plastid regions or the ycf1 gene alone successfully distinguished all three Triplostegia species (Table S7).

Species distribution modeling
The AUC values for all models in this study were >0.99, indicating high model performance (Table S11). Precipitation of the warmest quarter (Bio18) was the most important bioclimatic variable in determining the geographical distribution of all three Triplostegia species, with a particularly strong influence on T. glandulifera. The mean temperature of the coldest quarter (Bio11) was the second most important bioclimatic variable for T. glandulifera, while both the mean temperature of the coldest quarter (Bio11) and isothermality (Bio3) were the next most important variables for both T. grandiflora and Triplostegia sp. A (Fig. S10; Tables S12-14).
The potential suitable habitats for Triplostegia sp. A exhibited similarities with those of T. grandiflora during each time period, with the current predicted range largely confined to the HDM and the Himalaya. However, the potential suitable habitat for T. glandulifera was larger, extending to East Asia (Fig. S11). The projected past suitable habitats for each Triplostegia species were much smaller than their current suitable habitats, especially for T. glandulifera during LIG and LGM. Moreover, all three Triplostegia species are projected to experience pronounced habitat shrinkage by 2090. Under the moderate (SSP2-4.5) and pessimistic (SSP5-8.5) climate [...] (Fig. S12; Table S15).
The niche overlap between T. grandiflora and Triplostegia sp. A was the largest, followed by that between T. glandulifera and Triplostegia sp. A, while T. grandiflora and T. glandulifera showed the smallest niche overlap (Fig. 8; Table S16).

Confirmation of a third species in Triplostegia
Our phylogenetic analyses based on datasets of complete plastid genomes (Fig. 3), plastid CDS (Fig. S5), and highly variable plastid DNA regions consistently indicate that Triplostegia contains three well-supported monophyletic species: T. glandulifera (BS ML = 100%, PP BI = 1.00 from complete plastid genome data), T. grandiflora (BS ML = 100%, PP BI = 1.00), and an undescribed species Triplostegia sp. A (BS ML = 100%, PP BI = 1.00). Of these, T. glandulifera branched off first, making the other two sister species (Fig. 3). This topology is consistent with a previous study [59], which examined fewer accessions and did not distinguish Triplostegia sp. A (their accessions SRS3196660, SRS3196661, and SRS3196663) from T. grandiflora. Hence, molecular phylogenetic analyses may produce accurate topologies but alone cannot detect cryptic species with certainty. Our Neighbor-net analysis of combined plastid and nuclear data likewise indicated the division of Triplostegia into three distinct clusters (Fig. 4). Nuclear data alone did not conflict with the plastome-based topology, but samples of T. glandulifera did not form a well-supported monophyletic clade (Fig. S6), most likely due to limited resolution from the small part of the genome sampled.
Triplostegia sp. A is clearly defined by morphology, as well as plastid data. It differs from its closest relative T. grandiflora in seven morphological traits (Table 4), whereas three traits (plant height, petiole length and degree of leaf division) provide consistent differences between all species (Table 4). Ecologically, there is a clear separation by altitude, with sp. A occurring from 1800 to 4342 m, compared to 1800-3200 m for T. grandiflora and 1250-3400 m for T. glandulifera. The wider altitude range of the high-altitude sp. A is consistent with Rapoport's Rule [103], which postulates that species at higher elevations tend to have larger elevation ranges. The differing altitude ranges might also contribute to the differing geographical ranges of the three species (Fig. 1). Although T. grandiflora and T. glandulifera are the most similar pair for altitude range, our ecological niche modeling results indicated that the greatest interspecific niche overlap was between T. grandiflora and sp. A (Fig. 8), indicating a correlation between niche overlap and relatedness, and hence phylogenetic conservatism [104]. Hence niche differentiation likely played a significant role in the species diversification of Triplostegia.
Functional trait differences between sp. A and T. grandiflora appear to be consistent with their ecological separation: a higher chlorophyll content in T. grandiflora indicates greater photosynthetic capacity [105], whereas its lower SLA (Fig. S9; Table S9) would normally indicate a resource-stressed environment [88]. Its leaves are also sessile and smaller but thicker, and it has a thicker taproot for water and nutrient storage (Table 4; Figs. S8-9; Table S9), consistent with it occupying a warmer and drier habitat.
Although sp. A occasionally coexists with T. grandiflora in Yunnan where their altitude ranges overlap, we found no morphological intermediates nor other evidence of hybridization. The two taxa are thus able to maintain distinct populations even where sympatric. Therefore, based on an integrative examination of molecular, morphological and ecological data, it is clear that sp. A represents an undescribed third species within the genus.
The phylogenetic position of Triplostegia has long been controversial [106,107], but our phylogenetic analysis of Dipsacales based on plastid genomes provides compelling evidence that Triplostegia is a monophyletic genus, sister to a clade comprising Dipsacus, Scabiosa, and Pterocephalus (Fig. 2). This is consistent with previous phylogenetic reconstructions of Dipsacales based on plastid genomes, which sampled fewer Triplostegia individuals [53,55].

Geography, climate and causes of speciation
Recent rapid speciation is a feature of the Hengduan Mountains Region (HDM) [48,49], thought to be driven by the uplift of the Hengduan Mountains and the late Miocene to Pliocene intensification of the Asian monsoon [46-48, 108, 109]. Such rapid uplifts create new niches at high altitude which newly formed species may inhabit, e.g. the homoploid hybrid species Pinus densata [110,111]. Triplostegia sp. A and T. grandiflora represent a high/low altitude species pair in the HDM region, similar to Roscoea humeana and R. cautleoides [112]. The altitude ranges of T. glandulifera and T. grandiflora are fairly similar (Table 4), which indicates that the most recent common ancestor (MRCA) of the genus, and also that of the sp. A-T. grandiflora species pair, probably occupied lower altitudes. If so, the speciation event that produced sp. A might have involved an incursion into colder and/or higher altitude conditions. The timing of this split, around 1 million years ago, indicates that it might have come about due to Quaternary climate fluctuations, with one lineage adapting to cooling conditions coming out of an interglacial while the other moved to lower altitudes or latitudes, tracking the climate.
Large rivers may act as barriers to gene flow [113][114][115][116], with examples within China for animals [117,118], fungi [119], and plants [120,121]. However, both T. grandiflora and Triplostegia sp. A occur on both sides of the steep-sided Jinsha River gorge (Fig. 6), indicating that this river gorge is easily traversed, as found for Roscoea [112]. We observed that the glandular pubescent fruit of Triplostegia [52] can easily attach to animal fur and human clothing, potentially facilitating dispersal across the river barrier. River gorges serve as strong barriers for Vitex negundo [120] and Parrotia subaequalis [121], both of which have seeds apparently dispersed by gravity, whereas certain species in the Amazon region of South America are not affected by river barriers to dispersal [116]. These results suggest that the barrier effect of river gorges largely depends on the specific dispersal traits of plants [115]. A more comprehensive analysis of dispersal and gene flow in Triplostegia will require the sampling of more than one individual per population, however.

Plastid genome features and nucleotide diversity
Newly originated species typically have a narrow geographical range and lower levels of genetic diversity compared to more ancient and widespread congeners [122]. Consistent with this, T. glandulifera diverged ~7.94 Ma, has the widest distribution range (Fig. 1), and according to Neighbour-Net analysis (Fig. 4) and nucleotide diversity (π) exhibits much higher levels of intraspecific genetic variation than the other two species, which diverged from each other ~1.05 Ma. Likewise, plastid genome size varied by 1215 bp within T. glandulifera (Table 1, Fig. S1), due to expansion and contraction of the IR/SC boundary regions (Fig. S4), which is a major mechanism underlying plastid genome size variation in plants [28,39,123]. Length variation within T. grandiflora was 68 bp, and it had the least genetic variation (π) in general, consistent with this species having the smallest geographical range of the three (Fig. 1), whereas Triplostegia sp. A was intermediate for both range and genetic variation. Otherwise, Triplostegia had a high level of conservation in plastid genome structure, gene order, gene content, and genome size, consistent with previous work on the genus [59] and family (Caprifoliaceae) [54,55,[124][125][126].

DNA barcodes for species discrimination
The standard plant DNA barcodes, including rbcL, matK, trnH-psbA, and ITS [23], have been widely used in fields such as community ecology [78,127], invasive species management [128], and forensic identification [129,130]. But they are not always effective, especially for taxa that have recently diverged or possess complex evolutionary history [26,27], and none of these, either singly or in combinations, were able to discriminate all three Triplostegia species. However, the complete plastid genomes and the highly variable cpDNA region rpoB-trnC alone each successfully discriminated all three Triplostegia species with high bootstrap values (Table 3; Table S7). The rpoB-trnC locus is highly variable in the plastid genomes of other plant lineages, such as Papaver [131], Dioscorea [132], and Debregeasia [133]. In addition, the ycf1 gene, which is highly variable in flowering plants [77], contained species-specific insertions of 66 bp for T. grandiflora and 18 bp for T. glandulifera, making it a powerful DNA barcode that could discriminate all three Triplostegia species (Table 3; Table S7). The specific function of the ycf1 gene remains to be explored [134,135], but it or other plastome variation could be linked to the differences in leaf chlorophyll content between T. grandiflora and Triplostegia sp. A, and other divergences in photosynthesis-related functions. Therefore, either rpoB-trnC or ycf1 can be used as a taxon-specific DNA barcode for discriminating Triplostegia species. Our results highlight the potential of developing taxon-specific barcodes for recently diverged taxa based on plastid genome data, which has been successfully applied in many other plant taxa [42,136].

Fig. 1 Occurrences of Triplostegia glandulifera (yellow dots), Triplostegia sp. A (red triangles), and T. grandiflora (green squares). The areas circled with yellow, red, and green lines are the geographical distribution areas of T. glandulifera, Triplostegia sp. A, and T. grandiflora, respectively. (The map was created by the authors using ArcGIS software)

Fig. 2 Phylogenetic relationships of Dipsacales constructed using RAxML based on complete chloroplast genome sequences. The maximum likelihood (ML) tree is presented, with maximum likelihood bootstrap support values (BS) and Bayesian inference posterior probability (PP) values given for each node. Nodes with a '*' symbol represent nodes that received maximum support from ML or BI analysis ('*': 100% or 1.0). Nodes without values represent maximal support in both ML and BI methods (BS ML = 100%, PP BI = 1.00)

Fig. 3 Phylogenetic relationships of 33 samples of Triplostegia species based on complete chloroplast genome sequences. The phylogenetic tree was constructed using both maximum likelihood (ML) and Bayesian inference (BI) methods. The maximum likelihood (ML) tree is presented. Numbers along the branches indicate bootstrap support values from ML analysis (based on 1000 replicates) and Bayesian posterior probabilities from BI analysis

Fig. 4 a Unrooted neighbour-joining (NJ) tree of Triplostegia based on the p-distance calculated from three highly variable plastid DNA regions (ndhF, ndhF-trnN, rpoB-trnC) and nuclear ITS sequences. b Neighbor-net analysis of Triplostegia based on complete chloroplast genome and nrDNA sequences. Bootstrap values (based on 1000 replicates) are indicated along the branches for each cluster

Fig. 7 Principal component analysis (PCA) of 10 morphological traits of the three Triplostegia species. Morphological trait data were collected by measuring specimens of Triplostegia

Table 1 General characteristics of the chloroplast genomes of Triplostegia species. IR, inverted repeat; LSC, large single-copy; SSC, small single-copy

Table 3 Tree-based species discrimination rates of Triplostegia using the highly variable plastid DNA regions ndhF, ndhF-trnN, and rpoB-trnC, and the standard plant DNA barcodes rbcL, matK, and ITS, singly or in combinations. n.d., not discriminated: the species failed to form a monophyletic clade with bootstrap value ≥70%. The standard plant DNA barcode trnH-psbA was not included in our tree-based analyses due to an insufficient number of informative sites for species discrimination

Table 4 Differences in elevation and morphological traits among the three Triplostegia species